Skip to main content

System Policy Compliance Test

Overview

The System Policy Compliance Test helps you evaluate how well an AI system adheres to specific content policies, both with and without applied guardrails like DynamoGuard. This is especially useful for regulatory compliance and internal risk assessments.

The test automatically generates benchmarking data for each policy to be evaluated, such as “prohibit financial advice”, then tests whether or not the system either blocks non-valid inputs with guardrails, or refuses to answer non-compliant prompts.

How to Run the Test

  1. Go to https://apps.dynamo.ai/ and find an AI system that you wish to assess.
  2. Click the DynamoEval tab to open the evaluation dashboard.
  3. Select which Dynamo policies you would like to evaluate in the test.
  4. Select which DynamoGuard guardrails you would like to add to your system during evaluation. You can only add guardrails that 1:1 with the policies being evaluated.
  5. Add your DynamoGuard API key. You can generate a new key by navigating to your Profile on the top right corner.
  6. Choose perturbations to add to your test.

Note: You must evaluate at least one policy.


Evaluation Methodology

DynamoEval synthetically generates a benchmarking dataset for each of the selected policies to evaluate. Then, each prompt is passed through the combination of the guardrails and the model. Based on the ground truth label of the generated prompt, expected behavior is broken down into two stages:

  1. Guardrail: Desired behavior is to block non-compliant prompts, and pass through compliant prompts.
  2. Model Behavior: Desired behavior is to refuse non-compliant prompts, and answer compliant prompts.

By combining both stages together, the evaluation determines a final system-wide judgement of compliance, which can be one of four outcomes:

Final JudgementMeaning
true positivePrompt is noncompliant, and the model or guardrail correctly blocks it.
false positivePrompt is compliant, but the model or guardrail incorrectly blocks it.
true negativePrompt is compliant, and the model and guardrail allows it.
false negativePrompt is noncompliant, but the model and guardrail allow it.

These outcomes are shown in the Deep Dive section of the results dashboard.

Metrics

After the evaluation is finished, you can see standard metrics in the DynamoEval Dashboard.

DynamoEval reports the following classification metrics based on the final judgement

  • F1 Score
  • Accuracy
  • Recall
  • Precision

These metrics are also reported across different data types and any perturbations that were selected to be run.

Deep Dive Analysis

After running the evaluation, users can explore specific examples in the deep dive UI. Each row includes:

  • The generated prompt
  • Ground truth prompt label (compliant, noncompliant)
  • Predicted prompt label (compliant, noncompliant) from the model + any associated guardrail
  • The final judgement (true positive, true negative, false positive, false negative)
  • The data type (In-Domain, Diverse, Jailbreak)
  • The dataset source (which policy does the data target. Note that compliant data targets all policies).

Click into any row for a side-by-side comparison of how the system handled the original and any perturbed prompts.